What is tar-stream?
The tar-stream npm package is a streaming tar parser and generator, which allows users to read and write tar archives in a streaming fashion. This means that you can process tar files without having to load the entire file into memory, which is useful for handling large files or for streaming applications.
What are tar-stream's main functionalities?
Extracting a tar archive
This feature allows you to extract files from a tar archive. The 'entry' event is emitted for each file in the archive, providing the file header and a stream for the file content.
const extract = require('tar-stream').extract;
const fs = require('fs');
let extractor = extract();
extractor.on('entry', (header, stream, next) => {
// header is the tar header
// stream is the content body (might be an empty stream)
// call next when you are done with this entry
stream.on('end', () => next());
stream.resume(); // just auto drain the stream
});
fs.createReadStream('archive.tar').pipe(extractor);
Creating a tar archive
This feature allows you to create a tar archive. You can add entries to the archive with the 'entry' method, and then finalize the archive when you are done.
const pack = require('tar-stream').pack;
const fs = require('fs');
let packer = pack();
// add a file called my-test.txt with the content 'Hello World!'
packer.entry({ name: 'my-test.txt' }, 'Hello World!', (err) => {
if (err) throw err;
packer.finalize(); // finalize the archive when you are done
});
// pipe the pack stream somewhere, like to a file
packer.pipe(fs.createWriteStream('my-tarball.tar'));
Other packages similar to tar-stream
archiver
Archiver is a high-level streaming archive library that supports creating TAR and ZIP archives. It provides more abstraction than tar-stream and includes additional features like appending files from streams, buffers, or directories, and setting global archive options.
tar-fs
tar-fs is a Node.js module that provides filesystem bindings for tar-stream. It allows you to pack directories into tarballs and extract tarballs into directories using a file system interface, making it a bit more convenient for certain use cases compared to the lower-level tar-stream.
tar
The 'tar' package is a full-featured Tar for Node.js, which includes utilities for creating, manipulating, and extracting tar archives. It's a higher-level package compared to tar-stream and includes features like gzip compression and decompression.
tar-stream
tar-stream is a streaming tar parser and generator and nothing else. It operates purely using streams which means you can easily extract/parse tarballs without ever hitting the file system.
Note that you still need to gunzip your data if you have a .tar.gz. We recommend using gunzip-maybe in conjunction with this module.
npm install tar-stream
Usage
tar-stream exposes two streams: pack, which creates tarballs, and extract, which extracts tarballs. To modify an existing tarball, use both.
It implements USTAR with additional support for pax extended headers. It should be compatible with all popular tar distributions out there (gnutar, bsdtar, etc.).
Related
If you want to pack/unpack directories on the file system check out tar-fs which provides file system bindings to this module.
Packing
To create a pack stream use tar.pack() and call pack.entry(header, [callback]) to add tar entries.
const tar = require('tar-stream')
const pack = tar.pack()
pack.entry({ name: 'my-test.txt' }, 'Hello World!')
const entry = pack.entry({ name: 'my-stream-test.txt', size: 11 }, function(err) {
pack.finalize()
})
entry.write('hello')
entry.write(' ')
entry.write('world')
entry.end()
pack.pipe(process.stdout)
To extract a stream use tar.extract() and listen for extract.on('entry', (header, stream, next) => ...).
const extract = tar.extract()
extract.on('entry', function (header, stream, next) {
stream.on('end', function () {
next()
})
stream.resume()
})
extract.on('finish', function () {
  // all entries have been read
})
pack.pipe(extract)
The tar archive is streamed sequentially, meaning you must drain each entry's stream as you get them or else the main extract stream will receive backpressure and stop reading.
The extraction stream, in addition to being a writable stream, is also an async iterator:
const extract = tar.extract()
someStream.pipe(extract)
for await (const entry of extract) {
entry.header // the tar header for this entry
entry.resume() // the entry body is also a stream; drain it
}
The header object used in entry should contain the following properties.
Most of these values can be found by stat'ing a file.
{
name: 'path/to/this/entry.txt',
size: 1314,
mode: 0o644,
mtime: new Date(),
type: 'file',
linkname: 'path',
uid: 0,
gid: 0,
uname: 'maf',
gname: 'staff',
devmajor: 0,
devminor: 0
}
Modifying existing tarballs
Using tar-stream it is easy to rewrite paths, change modes, etc. in an existing tarball.
const extract = tar.extract()
const pack = tar.pack()
const path = require('path')
extract.on('entry', function (header, stream, callback) {
header.name = path.join('tmp', header.name)
stream.pipe(pack.entry(header, callback))
})
extract.on('finish', function () {
pack.finalize()
})
oldTarballStream.pipe(extract)
pack.pipe(newTarballStream)
Saving tarball to fs
const fs = require('fs')
const tar = require('tar-stream')
const pack = tar.pack()
const path = 'YourTarBall.tar'
const yourTarball = fs.createWriteStream(path)
pack.entry({ name: 'YourFile.txt' }, 'Hello World!', function (err) {
if (err) throw err
pack.finalize()
})
pack.pipe(yourTarball)
yourTarball.on('close', function () {
console.log(path + ' has been written')
fs.stat(path, function(err, stats) {
if (err) throw err
console.log(stats)
console.log('Got file info successfully!')
})
})
Performance
See tar-fs for a performance comparison with node-tar.
License
MIT